ABSTRACT


A  tutorial  is  presented  encompassing  both  biological  and  mathematical  aspects  of 
associative  memory  for  pattern  processing.  A  systems  viewpoint  is  adopted  whereby 
biological  associative  memory  is  viewed  as  a  system  of  adaptive  filters,  with  the  free 
parameters  of  the  filter  corresponding  to  the  strengths  of  the  biological  neural  connections. 
This viewpoint is certainly not intended to depict accurately the true mechanisms
underlying the extraordinary capabilities of biological associative memory: fast pattern
recognition and apparently infinite memory capacity. Such mechanisms are unlikely to be
discovered in the absence of tools that allow collective behavior to be observed over
systems of neurons. However, the viewpoint does serve to integrate mathematics and
biology on a general level.

Of  most  significance  is  perhaps  the  systematic  treatment  of  mathematical  associative 
memory.  In  the  adaptive  filter  framework,  associative  memory  is  described  and  compared 
to  traditional  statistical  techniques.  Also,  new  insight  into  the  generalization  capability  of 
associative  memory  is  expressed.  Conditions  are  presented  to  ensure  both  correct  memory 
recall and significant generalization.

ASSOCIATIVE MEMORY
Biological  and  Mathematical  Aspects 


1.  INTRODUCTION 

The recent interest in associative memory can perhaps be attributed to the increasing
awareness of the extraordinary, yet subtle, capabilities of biological associative memory.
With only a finite number of components and degrees of freedom ($10^{11}$ to $10^{12}$ neurons
with an average of $10^{3}$ connections per neuron), humans can store and recall seemingly infinite
memories. In contrast, classical computer systems utilizing addressed memory can only store
a number of memories that increases linearly with the number of components. Moreover, correct
computer memory recall requires exact address specification. Biological memory, by contrast,
implements recall by association. That is, a stored memory pattern can be recalled by mere
association with an incomplete excitation pattern (key). Thus, by association, inexact
information often results in correct memory recall, which constitutes a robust memory
recall mechanism.

Even more astonishing is the physical realization of biological associative memory with
slow and apparently inaccurate components. For example, the biological signal channels
(axons) are several orders of magnitude slower and more passive than the analogous
computer circuitry. Specifically, the resistance of one meter of nerve fiber is approximately
the resistance of $10^{10}$ miles of 22 gauge copper wire. Signals propagate through the axon
at a snail's pace (100 m/sec), compared to the near speed-of-light electrical signaling
achieved in computers. Yet the time required to recall a stored
memory (or equivalently recognize a pattern) by association is only approximately 100
msec for biological memory, while the conventional digital computer (with nanosecond
processors capable of performing tens of millions of instructions per second) requires
minutes to perform the same task.

So how does biological memory, characterized by seemingly infinite memory
capacity and quick recollection, emerge from slow, noisy, and imprecise biomass circuitry?
This question has driven research activity for decades, and even today it
remains largely unanswered.

However, physiological experiments have revealed fundamental differences between
biological and computer memory, which are likely to contribute to the vast processing gap.
Perhaps the most apparent contrast is that biological memory, in addition to being
associative, is distributed. That is, the memory function, and hence pattern recognition, is
spatially distributed over numerous neurons, rather than confined to a single location as in
computer memory. This distributed organization is believed to be responsible for the
inherent fault-tolerance properties, in which memory often remains intact after minor
damage. Furthermore, the distribution of the processing over numerous neurons provides a
parallel processing capability, thought to underlie the quick system response times achieved
with relatively slow circuitry.

Consequently, in building robust pattern recognition systems with increased
memory capacity, distributed associative memories patterned after biological neural
networks are the subject of much investigation. The purpose of this chapter is to provide a
tutorial encompassing both biological and mathematical aspects of associative memory.
The treatment is systematic and follows the work of Kohonen [1]. First, a brief section on
biological associative memory is included to provide a perspective on the neural circuitry
involved in memory; it is certainly not an explanation of the true mechanisms underlying human
memory. Next, associative memory is defined mathematically, with various models
presented and characterized. Then criteria are defined for assisting in the evaluation of such
models. In conclusion, possible future research activity is mentioned regarding the
development of systems much more characteristic of their biological counterparts.


2.  BIOLOGICAL  ASSOCIATIVE  MEMORY 


Although  many  theories  of  biological  memory  exist,  scientists  continue  to  seek  a 
true  understanding  of  the  mechanisms  which  underlie  the  extraordinary  capabilities  of 
human  associative  memory.  Here  a  systems  viewpoint  of  biological  memory  is  adopted, 
due  in  part  to  the  physiological  evidence  and  admittedly,  the  author’s  engineering 
background. 

As  with  any  large  system,  overall  or  collective  behavior  arises  from  the  various 
functional  components;  and  to  understand  the  collective  system  behavior,  the  components 
must  first  be  somewhat  understood.  Therefore,  the  discussion  begins  with  describing  the 
functional  components  of  biological  memory.  These  functional  components  are  viewed  as 
adaptive  filters,  whose  adaptive  elements  are  represented  biologically  by  synapses  (variable 
connections  between  brain  cells  (neurons)).  Then,  the  physical  organization  of  biological 
memory  is  discussed  within  such  context. 


2.1  System  Viewpoint  of  Biological  Memory 

Increasing evidence suggests the extreme complexity of the brain is not due to
randomness but instead arises from highly ordered design [2]: a design which couples
many distinct neural regions, each being tuned for specific stimuli processing. These
distinct neural regions include the visual cortex for vision and the somatosensory cortex for
tactile sensing. Within such regions there exist smaller subregions, again for specific
functions. In all, the brain can be considered as being composed of thousands of distinct
specialized regions, each containing thousands of subregions. Consequently these
subregions, composed of 10 to 1000 neurons, may be interpreted as the functional building
blocks or components of the overall system.

Experimentation has established biological memory as a
collective phenomenon, distributed over many such neural regions. This is confirmed by
experiments in which lesions are made in different brain localities and the resulting
impairment is observed to be a function of the severity of the lesion, and not so much its location
[3,4]. Thus biological memory is a system composed of numerous neural regions which
act collectively to yield an associative memory with extraordinary capabilities.


Subsequently,  these  neural  regions  comprising  memory  can  be  considered  as 
adaptive  filters,  whose  adaptive  elements  give  rise  to  the  distinct  properties  characteristic  of 
each  region  (see  Fig.  1.).  And  the  role  of  biological  memory,  or  analogously  the  system  of 
adaptive  filters,  is  to  create  an  internal  model  to  represent  sensory  environmental  history. 


Fig.  1.  Simple  system  of  adaptive  filters. 

Since  a  sensory  event  consists  of  thousands  of  activation  patterns  from  possibly 
many  sensors,  a  good  deal  of  preprocessing  is  presumed  to  facilitate  the  storage  of 
apparently  infinite  events.  The  preprocessing  in  turn  yields  higher  level  information  or 
features.  The  features  comprising  this  feature  set  or  map  are  then  believed  to  be  encoded 
through  association.  Thus  instead  of  memorizing  the  explicit  feature  patterns,  rather  the 
associations  between  the  features  are  preserved,  likely  by  the  neural  synapses.  Therefore  if 
an  input  key  representing  only  a  portion  of  the  feature  set  is  presented  to  the  system, 
through  association  the  complete  memory  event  is  recalled  [1].  Now  given  the  analogy  of 
biological  memory  with  a  system  of  adaptive  filters  for  extracting  and  associating  key 
features,  the  adaptive  filter  is  discussed  within  the  biological  context. 


2.2  Adaptive  Filter  Paradigm 


The adaptive filter representing a subregion of neurons is shown in Fig. 2. The
number of neurons represented by each filter coincides with the number of neurons
necessary to comprise a neural subregion with observable collective behavior, and hence
likely to be a functional entity. The inputs and outputs are multidimensional signals derived
from action potentials of nearby neurons. Most of the signal information is conveyed by
the frequency of the impulse train (impulses/sec) [5] and the location of the filter at which
the signal terminates. Almost no information is contained in the amplitude of the impulses,
for the amplitude is fairly constant throughout many animal species. The signal origin
and destination are of extreme importance in semantic interpretation. This is demonstrated
by realizing that the same electrical impulse train directed to the visual cortex and to the
auditory cortex produces profoundly different meanings.


Fig.  2.  Adaptive  filter  with  transfer  function  F  . 

The  filter  itself  is  represented  mathematically  by  a  transfer  function  F  relating  the 
outputs  to  the  inputs.  The  function  can  be  dependent  upon  the  inputs,  outputs,  and  the 
adaptive  filter  elements.  The  nature  of  the  adaptive  filter,  and  hence  the  neural  network,  is 
the  subject  of  the  Mathematical  Associative  Memory  section.  As  with  any  adaptive  filter, 
the  specification  of  the  adaptive  elements  or  parameters  gives  rise  to  the  filter's  identity. 
Correspondingly,  the  next  subsection  investigates  the  biological  equivalents  to  the  adaptive 
elements,  the  synapses. 

2.3 Adaptive Filter Elements - The Synapses

Many  neurobiologists  believe  the  unique  character  of  individual  human  beings, 
including  disposition,  learning,  and  memory  resides  in  the  geometry  and  specific  strengths 
(weights)  of  the  neural  interconnections  or  synapses  [6].  The  modification  of  these 
synapses  is  believed  to  be  of  primary  importance  both  in  learning  and  associative  memory, 
and  their  strengths  are  viewed  as  the  adaptive  elements  of  the  filters  (see  Fig.  3.) 

Fig. 3. a) Neural connection, b) signal transmission across synaptic channel, c) transmitted
and received signals.

Typically, modifications in connection geometry, such as sprouting new output
channels (axons), occur at an early stage of development. In the adult stage, however, most
interconnection modifications for memory and learning are conducted by varying the
synaptic strength. This change in synaptic strength alters the transmission of the action
potential  from  the  presynaptic  neuron  through  the  transmission  channel  to  the  postsynaptic 
neuron.  The  transmission  channel  between  the  communicating  neurons  supports  chemical 
signaling.  Specifically,  at  the  arrival  of  an  action  potential,  molecules  of  chemical 
transmitter  from  the  presynaptic  neuron  are  released  into  the  channel  and  received  by  the 
receptor  molecules  residing  in  the  postsynaptic  neuron.  Much  debate  arises  concerning 
whether  the  presynaptic  or  postsynaptic  neuron  (or  both)  is  physically  responsible  for  the 
change  in  synaptic  strength. 

Nevertheless, the synaptic strengths, analogous to the adaptive filter elements, are
adapted during both learning and memorization. Each synapse with a specific strength
performs a weighting of the respective input signal to the postsynaptic neuron (see Fig.
3c). Thus the neuron is often modeled as a device which first performs a weighted
summation over the input signals (where weight $w_{ij}$ designates the synaptic strength from
neuron i to neuron j) and then passes the result through a threshold function [7] (see Fig.
4). Upon specifying a rule for synaptic modification (adaptive filter algorithm) and
geometry, the simplified biological model becomes complete.


Fig. 4. Simple neuron model (weighted summation followed by threshold t).
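
The neuron model described above can be sketched in a few lines. This is only an illustration: the hard (step) threshold, the function name `neuron_output`, and the numerical weights are assumptions made for the example and are not taken from [7].

```python
def neuron_output(inputs, weights, threshold):
    """Weighted summation of the inputs followed by a hard threshold.

    The step-shaped threshold is an assumed choice; the text only
    states that the sum is passed through a threshold function.
    """
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation >= threshold else 0


# Hypothetical synaptic strengths: two excitatory (> 0), one inhibitory (< 0).
weights = [0.6, 0.5, -0.4]

fires = neuron_output([1, 1, 0], weights, threshold=1.0)   # sum is about 1.1
silent = neuron_output([1, 1, 1], weights, threshold=1.0)  # sum is about 0.7
print(fires, silent)
```

With these assumed values the inhibitory input is enough to pull the weighted sum below the threshold, so the second call does not fire.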


A well recognized synaptic modification rule states that the synaptic
strength changes in proportion to the correlation of the activity of the presynaptic and
postsynaptic neurons [8]. Mathematically, with $a_i$ and $a_j$ denoting the activities of
presynaptic neuron i and postsynaptic neuron j, the rule is expressed

$$\Delta w_{ij} \propto E\{a_i a_j\}$$

Two types of synapses are distinguished. The excitatory synapse promotes the firing of
the postsynaptic cell ($w_{ij} > 0$) while the inhibitory synapse reduces the chance of firing
($w_{ij} < 0$).
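
As a rough illustration of the correlation rule, the sketch below increments each weight by the product of presynaptic and postsynaptic activity. The learning rate `rate`, the function name `hebbian_update`, and the binary activity values are assumptions for the example; the source only states proportionality to the correlation.

```python
def hebbian_update(w, pre, post, rate=0.1):
    """Return new weights after one step: w[i][j] + rate * pre[i] * post[j].

    w[i][j] is the strength from presynaptic neuron i to postsynaptic
    neuron j. The per-sample product stands in for the correlation.
    """
    return [[w[i][j] + rate * pre[i] * post[j] for j in range(len(post))]
            for i in range(len(pre))]


w = [[0.0, 0.0], [0.0, 0.0]]
# Presynaptic neuron 0 and postsynaptic neuron 0 are repeatedly active together.
for _ in range(5):
    w = hebbian_update(w, pre=[1, 0], post=[1, 0])
print(w)  # only w[0][0] has grown
```

Only the connection between the two co-active neurons is strengthened; the other weights stay at zero because the corresponding activity products vanish.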

The time constants for these biological adaptive elements are surprisingly short.
Experimental results have shown that brief periods (seconds) of stimulation to neural
regions known to be involved in memory alter the synaptic strength for a substantial
amount of time, thereby supporting the notion of memory being correlated with synaptic
modification. These results imply that only moderate training is needed to produce lasting
synaptic modification to support learning and memory [6,9]. The implication is that, as you
read this article, your synapses are being modified accordingly (depending upon your
attention level).

2.4  Biological  Memory  Organization 

As  mentioned,  the  brain  is  a  highly  organized  system.  Signal  processing  for 
memory,  as  well  as  other  functions,  is  often  conducted  in  a  layered  fashion.  In  some 
cases,  these  layers  are  distributed  in  planar  arrays,  each  layer  being  a  neural  subregion  with 
observable  collective  behavior  (thus  comprising  a  functional  entity)  as  illustrated  in  Fig.  5. 
The  distinct  functionality  of  the  layers  is  demonstrated  by  electrophysiological  recordings, 
whereby  neurons  in  the  same  layer  respond  similarly  and  have  similar  receptive  fields 
(areas which influence the activity of a given neuron). The classical results of Hubel and
Wiesel [10] clearly demonstrate this layered processing phenomenon, specifically from the
retina to the visual cortex. Here the processing progresses from detecting light to detecting
complex geometries.


Particularly for associative memory, many portions of the brain contribute to the
overall processing. However, from lesion experiments, memory is believed to be
decomposable into various stages. Such physiological evidence suggests that for memory
storage, both an encoder and a physical storage medium (likely the synapses) are involved
(see Fig. 6).


Fig.  5.  Layers  of  neural  subregions  and  corresponding  adaptive  filter  system. 

The encoder is deemed necessary to reduce the redundancy from the overwhelming
number of patterns to be associatively stored. Evidence to support the source encoder
theory is provided by brain surgery originally performed in an attempt to correct epileptic
seizures. The corrective surgery involved removing the temporal lobes, which included the
hippocampus and amygdala. Although the epilepsy was cured, the patient, now without a
hippocampus, was unable to store new information in long term memory. Yet previous
long term memory (stored prior to the surgery) remained intact [11]. Thus theories
describing the hippocampus as a source encoder necessary for long term storage gained
credibility.


In fact some have theorized that the hippocampus may actually be a self-organizing
source encoder which maximizes the mutual information between the inputs and outputs (ensuring
transformation invertibility) while also minimizing the mutual information between the
output channels (ensuring minimally redundant output signals) [12,13]. Here the
processing is fairly localized and layered, composed of a minimum of three layers.

Although the encoding may be fairly localized, the actual storage is much more
distributed. The storage is believed to be distributed amongst the synaptic connections
throughout the cerebral neocortex, which amounts to 70% of the human brain.
Furthermore, the incredible capability of human memory, being vastly superior to that of any other
animal, is believed to be due to the substantially larger human cerebrum.


Fig.  6.  Components  of  memory  storage 


With regard to the biological mechanisms of associative memory recall, much less
is known and no attempt is made here to postulate a theory.


3.  MATHEMATICAL  ASSOCIATIVE  MEMORY 


Recall that the system viewpoint of biological memory envisions the memory
process being distributed over numerous regions of neurons, each region in turn composed
of subregions represented by adaptive filters whose elements are adaptively chosen. With
this viewpoint, the adaptive filter is seen as the system level building block. The
purpose of this section is to describe some of these building blocks mathematically.


3.1  Definition 

A single adaptive filter for modeling associative memory is shown in Fig. 7. The
input vector $x_i \in \mathbb{R}^N$ represents a prototypical key, while the output vector $y_i \in \mathbb{R}^L$ is
the corresponding memory. Note that the input and output vectors of finite dimension can also
represent continuous time processes, since any finite energy signal $x(t)$ can be expanded
by orthogonal functions yielding

$$x(t) = \sum_{k=1}^{N} x_k \phi_k(t)$$ (1)

where the $\phi_k(t)$ are the orthogonal functions and the $x_k$ are the components of the vector $x$.


The  filter  is  characterized  by  the  transfer  function  F  ,  dependent  upon  both  the  input  and 
the  filter  parameters  or  elements.  The  filter  is  designated  adaptive  if  the  matrix  of  elements 
remains  dependent  upon  the  data,  that  is 


(2) 


However  for  brevity,  the  notation  below  is  adopted  to  signify  the  adaptive  filter  transfer 
function 

Fig. 7. Adaptive filter with transfer function F.


3.2  Objective 


The  ideal  objective  of  mathematical  associative  memory  by  adaptive  filtering  is  to 
construct  a  transfer  function  in  an  adaptive  (often  recursive  or  iterative)  manner  such  that 


(i) $F(x_i) = y_i, \quad i = 1, 2, \ldots, M$ - perfect recall (4)

(ii) $F(x_i + n) = y_i$ ($n$ = perturbation) - generalization (5)

given a set of M arbitrary paired associations

$\{(x_1, y_1), (x_2, y_2), \ldots, (x_M, y_M)\}$ (6)


Pictorially, the ideal associative memory with both perfect recall and significant
generalization capability is depicted in Fig. 8. Perfect recall is shown by thin lines mapping
the values in the input space X to the corresponding correct memories in the output space
Y. The generalization capability represents the amount of perturbation or error tolerated in
the input key. Thus good generalization implies proper recall when excited by an
erroneous (yet similar) key, as shown with the bold line. Physically the erroneous keys
may represent an incomplete memory item, or a noisy version of the proper key $x_i$.

Fig. 8. Associative memory mapping.

Associative memories can be categorized into two classes. Hetero-associative
memories involve associations of dissimilar type. That is, the input and output vectors
have entirely different meaning, as in the above formulation. An example of hetero-associative
memory is the classification problem where objects $x_i$ are to be classified into
one of L categories. Here $y_i = (0, 0, \ldots, 0, 1, 0, \ldots, 0)$, where the position of the single 1
conveys the class into which object $x_i$ is categorized. Conversely, auto-associative
memories have associations of similar type.
Specifically $y_i = x_i$ in the above formulation. Here the actual input key $(x_i + n)$ is
just a perturbation of the memory $x_i$ (see Fig. 9).

The appeal of the adaptive filter associative memory is the ability of the filter to
adapt (synonymously learn, self-organize) in an effort to improve performance, whereby
the adaption is determined by the data, and is hence data driven. Thus the elements of T
specifying the function mapping the input keys to the memories are determined
automatically. In consequence, the function F relating inputs to outputs is learned by the
adaptive filter (with an appropriate learning algorithm), which alleviates the laborious
tasks often necessary in extracting explicit relationships, and the requirement of a priori
models.


Fig.  9.  Example  of  hetero-associative  and  auto-associative  memories 

3.3  Architecture 

Further investigation of the adaptive filter function F requires specifying a
particular class of functions. In turn, a class of possible functions can be defined in terms
of a given architecture. Correspondingly, the architecture which supports the class of
adaptive filter functions investigated next is shown in Fig. 10. Here the adaptive
filter transfer function is dependent upon the processors (shown by circles), the bias values
$\{d_j\}$, and the connections with strengths $\{t_{jk}\}$ ($t_{jk}$ = strength (weight) of the connection
from input k to processor j). Thus the adaptive elements are contained in the matrix
$T = [t_{jk}]$. Upon specifying a rule for the adaption of T, the bias values, and the
mathematical form of the processors, the filter is completely specified.

The analogy of the adaptive filter architecture in Fig. 10 with a region of neurons
becomes apparent if the weighted connections are seen as the synapses of varying strength,
and the processors consist of summations followed by nonlinear threshold activation
functions (refer to Fig. 4). To construct many regions of neurons to represent a
system, the above architecture is simply repeated.


Fig.  10.  Adaptive  Filter  Architecture 


3.4  Linear  Associative  Memory 

To  provide  a  sound  foundation  for  further  associative  memory  investigation,  the 
optimal  linear  associative  memory  model  is  first  presented.  Often  fundamental 
relationships  obtained  in  the  linear  case  provide  insight  into  the  nonlinear  cases.  And  as  a 
further  justification  for  examining  linear  models,  they  typically  perform  adequately  when 
operating  within  the  bounds  discussed. 

The linear architecture is easily obtained by designating the processors in Fig. 10
as mere summations. Hence the output component $y_{ij}$ becomes

$$y_{ij} = \sum_{k=1}^{N} t_{jk} x_{ik} + d_j$$ (7)

or equivalently in vector form

$$y_i = F(x_i) = T x_i + d$$ (8)

Also, the architecture can be drawn to emphasize the linear matrix formulation, termed the
learnmatrix [14,15,16], shown in Fig. 11.


In general (where exact solutions may not exist) the solution providing the best
recall in the sense of least squares ($\sum_i \| T x_i - y_i \|^2$ minimized) is given by

$$T = Y X^{+}$$

where $X^{+}$ is the pseudo-inverse [17].


Exact  solutions  are  obtained  in  the  cases  where  linear  independence  occurs. 
Specifically  three  cases  are  detailed  below. 

A) Prototype keys $\{x_i\}$ are linearly independent ($\Rightarrow M \le N$).

Since the keys are linearly independent, X consists of linearly independent
columns, and hence $(X^{*}X)^{-1}$ exists, yielding

$$T = Y (X^{*}X)^{-1} X^{*}$$

therefore

$$TX = Y (X^{*}X)^{-1} (X^{*}X) = Y$$

and perfect recall is assured (* denotes transpose). Notice also that since the
$x_i \in \mathbb{R}^N$ are linearly independent, at most N memories can be perfectly recalled,
or equivalently, the number of memories must be less than or equal to the
dimension of the key vector ($M \le N$).
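
The perfect-recall property for linearly independent keys can be checked numerically. The sketch below is an illustration only: the helper functions, the small hypothetical data set (N = 3, M = 2, L = 2), and the omission of the bias d are all assumptions made for the example. It forms T = Y (X*X)^{-1} X* and verifies that T X reproduces Y.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Columns of X are the keys x_i; columns of Y are the memories y_i (hypothetical values).
X = [[1.0, 1.0],
     [0.0, 1.0],
     [2.0, 0.0]]
Y = [[1.0, 0.0],
     [0.0, 1.0]]

Xt = transpose(X)
T = matmul(Y, matmul(inverse(matmul(Xt, X)), Xt))  # T = Y (X*X)^{-1} X*
recalled = matmul(T, X)                            # equals Y up to rounding
print(recalled)
```

The keys here are independent but not orthogonal, which is why the inverse of X*X is needed rather than X* alone.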


Consequently, when the linear associative memory is overloaded (M > N) and
the rows of X are linearly independent, the memory recalled is best in the sense of
least squared error, and is reminiscent of traditional optimal linear regression and
estimation (18,21) where the number of data pairs $\{(x_i, y_i)\}$ exceeds the vector
dimensionality.

These linear cases are illustrated in Fig. 12 with M = 6 pairs of single dimensional
(N = 1) associations. For perfect recall, only 1 arbitrary memory can be stored,
$(x_1, y_1)$. As shown, the compromise for memory overloading is inexact recall,
represented by the discrepancy (dashed line) between the true memory $y_i$ and the linear
mapping $F(x_i)$.


Fig. 12. Comparison of linear regression, estimation, and associative memory.
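
A small numerical sketch of the overloaded case with N = 1 and M = 6 follows. The data values are hypothetical and the bias d is assumed to be zero. With scalar keys, the least squares solution T = Y X^{+} reduces to the ratio of two sums, and the residuals show the inexact recall.

```python
# Six hypothetical (key, memory) pairs with scalar keys, so N = 1 and M = 6.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]


def squared_error(t):
    """Total squared recall error for the linear mapping F(x) = t * x."""
    return sum((y - t * x) ** 2 for x, y in zip(xs, ys))


# For a 1 x M matrix X, X^+ = X* (X X*)^{-1}, so T = sum(y_i x_i) / sum(x_i^2).
t = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

residuals = [y - t * x for x, y in zip(xs, ys)]
print(t, squared_error(t))
```

No single slope passes through all six points, so every residual is nonzero, yet any other slope gives a larger total squared error.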


C) Prototype keys $\{x_i\}$ are linearly independent and orthonormal ($\Rightarrow M \le N$).
Suppose  the  prototype  keys  are  mutually  orthogonal  unit  vectors.  Then  by 
definition 


Thus  perfect  recall  is  ensured  provided  the  prototype  keys  are  orthogonal.  Without 
orthogonality,  crosstalk  noise  is  mixed  (24)  with  the  true  memory,  thereby 
contributing  to  erroneous  recall.  Orthogonality  is  often  achieved  by  conventional 
Gram-Schmidt  orthogonalization.  Consequently,  the  prototype  keys  associated 
with  stored  memories  and  the  arbitrary  input  patterns  are  often  preprocessed  to 
enhance  orthogonality. 
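
The effect of orthogonality on recall can be illustrated with a short sketch. It assumes the memory matrix is built as a sum of outer products, T = sum_i y_i x_i^* (that is, T = Y X*), which is what the pseudo-inverse solution reduces to when the keys are orthonormal; the function names and vectors are hypothetical.

```python
def outer_sum_memory(keys, memories):
    """Build T = sum_i y_i x_i^* from lists of key and memory vectors."""
    n, l = len(keys[0]), len(memories[0])
    t = [[0.0] * n for _ in range(l)]
    for x, y in zip(keys, memories):
        for r in range(l):
            for c in range(n):
                t[r][c] += y[r] * x[c]
    return t


def recall(t, x):
    return [sum(a * b for a, b in zip(row, x)) for row in t]


memories = [[1.0, 0.0], [0.0, 1.0]]

# Orthonormal keys: recall of the first memory is exact.
ortho_keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
clean = recall(outer_sum_memory(ortho_keys, memories), ortho_keys[0])
print(clean)  # [1.0, 0.0]

# Unit-length keys with inner product 0.6: the second memory leaks in.
skewed_keys = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0]]
mixed = recall(outer_sum_memory(skewed_keys, memories), skewed_keys[0])
print(mixed)  # [1.0, 0.6]
```

The extra 0.6 in the second component is the crosstalk term: it equals the inner product of the two keys multiplied by the second memory.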

In consequence, the conditions for both perfect recall and best approximate recall
have been established for linear associative memory. However, to satisfy the ideal
objective, both generalization and adaptability must be addressed. Generalization is treated
in the next section, while the matrix of elements constructed by (14, 16, 18, or 23) is
generally made adaptive according to the recursion

$$T_k = T_{k-1} + (y_k - T_{k-1} x_k)\, g_k^{*}$$ (25)

where $T_k$ is the new adaptive element matrix formed from the previous matrix $T_{k-1}$, which
utilized (k-1) data pairs $\{(x_i, y_i)\}$, and $g_k$ is the gain vector. Formulas for the gain
vector for the cases addressed are contained in [1].
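
A recursive update of this kind can be sketched as follows. The form of the recursion, T_k = T_{k-1} + (y_k - T_{k-1} x_k) g_k^*, and in particular the gain g_k = x_k / (x_k^* x_k), are assumptions chosen for illustration; the gain formulas for the cases treated in the text are those given in [1] and are not reproduced here.

```python
def recursive_update(t, x, y):
    """One step T_k = T_{k-1} + (y_k - T_{k-1} x_k) g_k^*.

    Assumed gain: g_k = x_k / (x_k^* x_k), which makes the newest
    pair recalled exactly after the update.
    """
    predicted = [sum(a * b for a, b in zip(row, x)) for row in t]
    error = [yi - pi for yi, pi in zip(y, predicted)]
    energy = sum(v * v for v in x)
    gain = [v / energy for v in x]
    return [[t[r][c] + error[r] * gain[c] for c in range(len(x))]
            for r in range(len(y))]


def recall(t, x):
    return [sum(a * b for a, b in zip(row, x)) for row in t]


# Hypothetical orthogonal (not unit length) keys and their memories.
pairs = [([2.0, 0.0, 0.0], [1.0, 0.0]),
         ([0.0, 3.0, 0.0], [0.0, 1.0])]

T = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
for x, y in pairs:
    T = recursive_update(T, x, y)

print([recall(T, x) for x, _ in pairs])
```

Because the example keys are orthogonal, each update leaves the earlier associations untouched; with correlated keys this simple gain would disturb previously stored pairs.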


3.5  Generalization  Capability  of  Linear  Associative  Memory 


Generalization in the context of associative memory is the ability to recall the correct
memory when excited with an incorrect or perturbed input (key). Typical perturbations
may include missing input elements, random noise, or perhaps in vision, variations which
often prohibit correct identification. Mathematically, generalization can be viewed as the
ability to map all perturbations contained in a neighborhood about a prototype key $x_i$ to
the correct output memory (see Fig. 13). Here ideal generalization corresponds to
maximum input perturbation radius $r_I$ (without overlapping neighborhoods) and minimum
output perturbation radius $r_O$. Now upon specifying a class of perturbations,
generalization capability can then be formulated in terms of the parameters which influence
the growth of these neighborhoods.


Fig. 13. Function generalization with input neighborhood of perturbation radius $r_I$ being
mapped within output neighborhood of radius $r_O$.


Many perturbations arising in the mapping of a prototype key $x_i$ to the correct
memory $y_i$ can be modeled by

$$F(g(x_i)) = y_i'$$ (26)

where $g(\cdot)$ is the perturbation function, and $y_i'$ is the resulting output given the perturbed
input. Ideally, the output $y_i'$ would be the correct memory $y_i$. However, depending on
the functional form of the perturbation, correct memory recall may be impossible. For an
introductory treatment, only random noise perturbations will be examined, of the form

$$g(x_i) = x_i + n$$ (27)

Consequently, the objective of this section is to determine quantitatively the relationships
which influence the generalization capability of linear associative memories. The treatment
begins with conservatively relating the acceptable amount of input noise perturbation ($r_I$)
to the minimum distance between the input prototype keys. Next, the output perturbation
neighborhoods ($r_O$) are shown to be dependent upon how close to capacity the memory is
being operated.


Consider  the  noise  perturbation  model  (27)  where  the  perturbation  n  is  a  zero 

2 

mean  random  vector  of  variance  o  and  normalized  (in  energy)  to  the  dimension  N 


The  perturbed  input  is  then  treated  as  a  prototype  key  corrupted  with  additive  noise  (27). 

Although the elements of the noise vector may be symmetrically distributed (for example,
multivariate Gaussian), the resultant noise vector $n$ tends to lie near the surface of a
sphere with radius $\sigma$. (Specifically, the probability that $\|n\|^2$ deviates from $\sigma^2$ by a
given amount is bounded by a term that decreases with $N$ [18].) Thus for
typical associative memories (where $N > 100$), the perturbations encountered tend to be
concentrated on a spherical surface centered at the prototype key, with radius equivalent to
the noise standard deviation (see Fig. 14). This apparent concentration of noise, due to
large dimensional spaces, is encountered in communication theory and termed sphere
hardening [18].
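
The concentration effect can be observed numerically. The following is a minimal sketch, assuming Gaussian noise whose components each have variance $\sigma^2 / N$ so that the expected energy $E\|n\|^2$ equals $\sigma^2$ (one reading of the energy normalization above); the function name `norm_spread` and all parameter values are illustrative and not part of the original treatment.

```python
import numpy as np

def norm_spread(N, sigma=1.0, trials=2000, seed=0):
    """Return (mean, std) of ||n|| for zero-mean Gaussian n with E||n||^2 = sigma^2."""
    rng = np.random.default_rng(seed)
    # Assumption: each component has variance sigma^2 / N, so the total energy is sigma^2.
    n = rng.normal(0.0, sigma / np.sqrt(N), size=(trials, N))
    norms = np.linalg.norm(n, axis=1)
    return norms.mean(), norms.std()

# As N grows, the norm clusters ever more tightly around sigma.
for N in (2, 10, 100, 1000):
    mean, std = norm_spread(N)
    print(f"N={N:5d}  mean ||n|| = {mean:.3f}  std of ||n|| = {std:.3f}")
```

For small $N$ the norm of the noise vector varies widely from trial to trial, whereas for large $N$ nearly every sample lies close to the sphere of radius $\sigma$.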


Fig. 14. Sphere hardening at prototype key $x_i$.


In a practical setting, the perturbation variance is rarely known. However, an upper
limit can often be specified. Thus if the variance can be bounded by $\sigma_{max}^2$, then the
perturbation is likely to be concentrated on a sphere of radius no greater than $\sigma_{max}$.
Consequently, the minimum distance between any two prototype keys must be greater than
the maximum perturbation standard deviation to avoid (with high probability) unresolvable
recall errors.


$$d_{min} = \min_{i \neq j} d(x_i, x_j) > \sigma_{max} \triangleq r_i$$


Notice that an unresolvable recall error occurs when the perturbed input $x_i + n$ lies on
another prototype key $x_j$, and hence $y_j$ is incorrectly recalled (see Fig. 15). Clearly,
the best representation for the prototype keys $\{x_i\}$ would be one which maximizes the
separation amongst the keys in the input space, thereby accommodating the largest amount
of random perturbation.
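
The separation condition can be checked directly for a given set of keys. The sketch below assumes Euclidean distance for $d(\cdot,\cdot)$ and stores the prototype keys as the columns of a matrix; the helper names `min_key_distance` and `keys_tolerate_noise` are hypothetical.

```python
import numpy as np

def min_key_distance(X):
    """Smallest Euclidean distance between distinct columns (prototype keys) of X."""
    diffs = X[:, :, None] - X[:, None, :]      # shape (N, M, M): all pairwise differences
    d = np.linalg.norm(diffs, axis=0)          # (M, M) matrix of pairwise distances
    np.fill_diagonal(d, np.inf)                # ignore d(x_i, x_i) = 0
    return d.min()

def keys_tolerate_noise(X, sigma_max):
    """True when the minimum key separation exceeds the noise bound sigma_max."""
    return min_key_distance(X) > sigma_max

# Three keys in a 2-dimensional input space (columns), chosen for illustration.
X = np.array([[0.0, 3.0, 6.0],
              [0.0, 4.0, 8.0]])
print(min_key_distance(X), keys_tolerate_noise(X, 4.0), keys_tolerate_noise(X, 6.0))
```

A representation that spreads the keys further apart raises the minimum distance and therefore admits a larger noise bound.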


Fig. 15. Unresolvable recall error occurring for $\sigma_{max} > d_{min}$.


The second relation involves the growth of the output perturbation neighborhood as
a function of how close to capacity the memory is being operated. The output perturbation
neighborhood about memory $y_i$ is that region mapped into the output space arising from
inputs within the region about $x_i$ of radius $r_i$. (Mathematically, the neighborhood is
written $\{\, y : y = F(x),\ \|x - x_i\| \le r_i \,\}$.) Ideally, the output perturbation
neighborhoods should be as small as possible, so that many input keys representing
perturbations of the true key are mapped very near the correct memory.


To  obtain  the  relationship,  consider  the  general  linear  associative  memory  with  a 
perturbed  input  as  shown. 


Fig. 16. Perturbation example: perturbed input $\tilde{x} = x_j + n$ and output $\tilde{y} = T(\tilde{x})$.


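Recall the general solution $T = YX^+$, hence

$$\begin{aligned}
\tilde{y} &= YX^+ \tilde{x} \\
          &= YX^+ (XX^+ \tilde{x}) \qquad \text{(def. of pseudoinverse)} \\
          &= YX^+ [XX^+ x_j + XX^+ n] \\
          &= YX^+ (x_j + \hat{n})
\end{aligned}$$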


where $\hat{n} = XX^+ n$ is the projection of the noise vector $n$ onto the space spanned by the
prototype keys $\{x_i\}$. Now the variance of the norm of the effective noise $\hat{n}$ is [1]

$$\mathrm{Var}(\|\hat{n}\|) = \frac{M}{N} \|n\|^2 \qquad (31)$$

and thus the noise term is attenuated by $\sqrt{M/N}$ on the average when mixing with the
signal that represents perfect recall or the best approximate recall, depending upon the cases
previously stated. Therefore, to combat the deleterious effects of input perturbations
(potentially causing large deviations in the recalled memory pattern from the true memory),
the number of memories stored is to be kept much smaller than the vector dimension
($M \ll N$), which implies operating the memory well under capacity, as illustrated in Fig. 17.
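
The attenuation can be checked by simulation. The sketch below assumes randomly drawn Gaussian prototype keys and isotropic Gaussian noise, and it reads the attenuation as a statement about the average energy ratio $\|\hat{n}\|^2 / \|n\|^2$; the function name `projected_noise_ratio` and the dimensions used are illustrative only.

```python
import numpy as np

def projected_noise_ratio(N, M, trials=2000, seed=0):
    """Average of ||XX^+ n||^2 / ||n||^2 over random noise vectors n."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(N, M))            # M random prototype keys as columns (assumed data)
    P = X @ np.linalg.pinv(X)              # XX^+, the projector onto span{x_i}
    noise = rng.normal(size=(trials, N))   # each row is one noise vector n
    projected = noise @ P.T                # each row is n_hat = XX^+ n
    ratios = (projected ** 2).sum(axis=1) / (noise ** 2).sum(axis=1)
    return ratios.mean()

N, M = 200, 20
print(f"measured ratio = {projected_noise_ratio(N, M):.3f},  M/N = {M / N:.3f}")
```

Under these assumptions the measured energy ratio tracks $M/N$, so storing fewer memories relative to the vector dimension leaves less of the input noise in the recalled output.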


In summary, generalization depends upon the parameters which control the size
of the perturbation neighborhoods. Ideally, the input perturbation neighborhoods are
large, while the output neighborhoods are small. For linear associative memory, the
maximum input perturbation neighborhood is limited by the distance between the prototype
keys, while the output perturbation regions are driven small by operating the memory well
below memory capacity, thereby attenuating the perturbation by the square law (31).
Several demonstrations of generalization with both random noise and missing fragment
perturbations are contained in [1,19].

For general associative memory, generalization typically follows from selecting
a representation where the associated data pairs $\{(x_i, y_i)\}$, or clusters, are well separated
in their respective spaces and the memory is operated well within the memory capacity.

3.6  Nonlinear  Associative  Memory 

Several limitations accompany the linear associative memories described in the
previous section. Fundamentally, since the mapping is linear, input prototype keys $\{x_i\}$
which are close together in the domain $X$ must be mapped close together in the range $Y$.
Thus linear associative memories will not suffice where similar prototype keys need to be
mapped to dissimilar output memories. Furthermore, the linear memories ignore contextual
information, which is believed to be of primary importance in biological memory, enabling
selective recall amongst seemingly infinite memories. Moreover, the preprocessing
requirement (to obtain linearly independent prototype keys) necessary to achieve perfect
recall is often burdensome. Together with the low memory capacity (number of memories
$\le$ vector dimension), reduced further for good generalization, linear associative memories
leave much to be desired in contrast to biological memory.

In  an  attempt  to  counter  some  of  the  mentioned  limitations,  nonlinear  associative 
memories  have  been  proposed  and  several  are  discussed  in  detail  in  the  following  chapters. 
For  perspective,  some  of  the  models  are  briefly  summarized  below. 

The  Hopfield  model  [20]  is  an  auto-associative  memory  with  feedback,  comprised 
of  a  linear  model  (specifically  the  outer  product  technique,  case  C)  followed  by  a  nonlinear 
threshold  function.  Although  the  iterative  process  converges,  spurious  memories  often 
result  [21]  and  the  memory  capacity  is  low  [22]. 
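
The structure just described (outer product storage followed by a threshold, applied iteratively) can be sketched as follows. This is a minimal illustration and not the formulation of [20]: it assumes bipolar (+1/-1) patterns, a zeroed diagonal, and synchronous updates for brevity; the function names are hypothetical.

```python
import numpy as np

def hopfield_store(patterns):
    """Outer-product weight matrix for bipolar (+1/-1) patterns given as rows."""
    P = np.asarray(patterns, dtype=float)
    W = P.T @ P / P.shape[1]               # sum of outer products, scaled by dimension
    np.fill_diagonal(W, 0.0)               # assumption: no self-connections
    return W

def hopfield_recall(W, key, max_iters=20):
    """Feed the thresholded output back as input until the state stops changing."""
    s = np.asarray(key, dtype=float).copy()
    for _ in range(max_iters):
        s_new = np.where(W @ s >= 0, 1.0, -1.0)   # linear map followed by threshold
        if np.array_equal(s_new, s):
            break
        s = s_new
    return s

rng = np.random.default_rng(1)
patterns = rng.choice([-1.0, 1.0], size=(3, 100))   # 3 memories of dimension 100
W = hopfield_store(patterns)
noisy = patterns[0].copy()
noisy[:10] *= -1                                     # corrupt 10 of the 100 elements
print("recovered:", np.array_equal(hopfield_recall(W, noisy), patterns[0]))
```

With only a few memories stored relative to the dimension, the corrupted key is pulled back to the stored pattern; loading the network more heavily is where spurious states become likely.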


The Grossberg model [23] creates its own memories depending upon the degree of
similarity desired by the modeler (vigilance parameter) between the keys and respective
memories. Since the network is allowed to grow as needed to represent the memories,
comparing the memory capacity of such a model to the previous models with fixed
architectures is inappropriate.

The Poggio associative memory model is optimal amongst memories of matrix
polynomial form [24]. Although the memory is nonlinear in the input key vector, the
adaptable parameters enter linearly, and are thus easily calculated from the least squares
criterion.


In the associative net by Willshaw [25], the memory capacity is increased at the
expense of restricting the form of the binary input and output vectors. For maximal
information storage, the number of ones in the vector keys and memories is to be
$\log N$. As a consequence, a sparse connection matrix (approximately half zeros) of
binary switches is formed, yielding a memory capacity
$\left( M = N^2 \, \frac{\ln^3 2}{\ln^2 N} \right)$ which exceeds that of linear associative memory.
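
A binary switch matrix of this kind can be sketched in a few lines. The example below is an illustration under simple assumptions and not the construction of [25]: a switch is closed whenever any stored pair activates both of its lines, and an output line fires when it is connected to every active input line. The function names and the tiny hand-picked patterns are hypothetical.

```python
import numpy as np

def willshaw_store(keys, memories):
    """Binary connection matrix: a switch closes if any stored pair activates both lines."""
    W = np.zeros((memories.shape[1], keys.shape[1]), dtype=bool)
    for x, y in zip(keys, memories):
        W |= np.outer(y, x).astype(bool)
    return W

def willshaw_recall(W, key):
    """An output line fires when it is connected to every active input line."""
    active = int(key.sum())
    return (W.astype(int) @ key >= active).astype(int)

# Two sparse binary key/memory pairs (rows), chosen by hand for illustration.
keys = np.array([[1, 1, 0, 0, 0, 0],
                 [0, 0, 1, 1, 0, 0]])
memories = np.array([[1, 0, 1, 0],
                     [0, 1, 0, 1]])
W = willshaw_store(keys, memories)
for x in keys:
    print(x, "->", willshaw_recall(W, x))
```

Because only binary switches are stored, the sparseness of the keys and memories is what keeps different pairs from interfering with one another.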


The  perceptron  by  Rosenblatt  [26]  is  a  nonlinear  hetero-associative  memory  which 
is  rather  limited  in  the  class  of  mappings  which  can  be  learned  [27].  However  by 
cascading  several  layers  of  the  basic  architecture  or  filter  (see  Fig.  10.),  and  incorporating 
one  of  several  multilayer  learning  algorithms  [28-32],  the  multilayer  perceptron  becomes 
capable  of  learning  much  more  complex  mappings. 

Overall, the key to devising a good associative memory lies with the expressivity of
the network architecture. That is, the larger the class of functions the network can realize,
the more arbitrary the data pairs $\{(x_i, y_i)\}$ can be. For if an associative memory could
be constructed capable of mapping arbitrary functions, then any arbitrary set of keys and
memories could be associated with perfect recall performance. Although no such universal
associative memory has been practically constructed, Kolmogorov has proved existence
[33]. Even more striking, the Kolmogorov architecture somewhat resembles a
biological neural network, as described below.


First realize that a sufficient condition for perfect recall performance amongst
arbitrary data pairs is for the associative memory architecture to be capable of realizing
arbitrary functions. (This is easily demonstrated in a single-dimension example by
considering the data pairs $\{(x_i, y_i)\}$ plotted in Fig. 18. Perfect recall is ensured
($y_i = F(x_i)\ \forall i$) provided the architecture can express any function $F$ which
intersects all points.)
Fig. 18. Perfect recall attained by an associative memory realizing function $F$.
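
The single-dimension argument can be made concrete with a small sketch. Here a polynomial of degree $M - 1$ stands in for an architecture expressive enough to pass through all $M$ data pairs, while a straight line stands in for a linear memory; the data values are arbitrary, and the polynomial is only one convenient choice of $F$, not a method proposed in the text.

```python
import numpy as np

# Four arbitrary scalar key/memory pairs (x_i, y_i), chosen for illustration.
keys = np.array([0.0, 1.0, 2.0, 3.0])
memories = np.array([1.0, -2.0, 0.5, 4.0])

# An expressive F: a degree M-1 polynomial can intersect all M points.
F = np.poly1d(np.polyfit(keys, memories, deg=len(keys) - 1))

# A restricted F: a straight line generally cannot.
F_linear = np.poly1d(np.polyfit(keys, memories, deg=1))

print("expressive F recalls all memories:", np.allclose(F(keys), memories))
print("linear F recalls all memories:   ", np.allclose(F_linear(keys), memories))
```

The richer function class recalls every memory exactly, while the linear fit can only approximate pairs that do not happen to lie on a line.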

Now Kolmogorov's theorem states that an arbitrary continuous function of many variables
can be realized by a finite superposition of continuous functions, each dependent upon only
a single variable, accordingly

$$F(x_1, \ldots, x_N) = \sum_{j=1}^{2N+1} G_j\!\left( \sum_{k=1}^{N} \Psi_{jk}(x_k) \right) \qquad (32)$$

where $\Psi_{jk}$ are fixed continuous increasing functions and $G_j$ are continuous functions
dependent on $F(\cdot)$. Thus in the vector format $y = F(x)$,
each output component $y_i$ (dependent upon many elements of $x$) is expressed according
to (32), yielding a vector of Kolmogorov representations. The accompanying network
architecture for each output realization $y_i$ is shown in Fig. 19a. Notice the similarity of
the architecture to neural network models (compare Fig. 4), whereby the functions $\Psi_{jk}$
represent nonlinear threshold functions amongst neurons comprising the first layer, and
$G_j$ are the nonlinear threshold functions for the second layer neurons. However, the
architecture is biologically implausible since all synaptic strengths are of equal magnitude.


Later Lorentz [34] and Sprecher [35] extended the results of Kolmogorov to yield
the architecture shown in Fig. 19b. Notice that the use of $N$ connection weights offsets the
stringent requirement of $N(2N+1)$ threshold functions on the first level of neurons.
Again, the architecture is biologically unlikely since the same synaptic strengths are
repeated to each of the neurons in the second layer.


Following  the  trend  of  compromising  connection  weights  for  nonlinear  threshold 
functions,  the  question  remains  as  to  whether  an  arbitrary  continuous  function  of  many 
variables  can  be  represented  as  a  finite  superposition  of  single  variable  functions  with  the 
more  biologically  realistic  architecture  of  Fig.  19c.  For  if  so,  this  mathematical  architecture 
would  prove  invaluable  towards  understanding  the  vast  capabilities  of  biological  associative 
memory,  as  well  as  providing  principles  for  constructing  associative  memories  of  far 
greater  capacity. 

3.7 Evaluation of Associative Memories

Due  to  the  vast  variability  in  the  network  architectures  employed  in  associative 
memories,  developing  an  evaluation  protocol  can  be  difficult.  This  section  briefly 
describes  a  general  set  of  criteria  likely  to  be  instrumental  in  evaluating  various  associative 
memories  for  application. 

The  evaluation  can  be  partitioned  according  to  the  storage  and  recall  operations. 
First  for  memory  storage,  the  memory  capacity  is  of  extreme  importance.  Such  capacity  is 
usually  expressed  as  an  upper  bound  on  the  number  of  memories  which  can  be  reliably 
recalled.  Another  storage  parameter  is  the  actual  efficiency  of  the  memory  storage,  being 
expressed  as  the  number  of  reliable  memories  stored  per  architecture  size  and  complexity. 
Learning  efficiency  expressed  as  the  amount  of  computation  (number  of  iterations)  required 
to  store  a  benchmark  memory  set  is  also  likely  to  be  an  important  storage  evaluation 
parameter. 

Secondly  in  regard  to  recall  evaluation,  most  important  are  speed  and  accuracy. 
The  accuracy  can  be  evaluated  by  simply  determining  the  sum  squared  error  resulting  from 
comparing  the  true  memories  to  the  associative  memory  outputs  under  key  prototype 
excitation.  Recall  efficiency  or  speed  entails  the  amount  of  time  (computation)  required  to 
recall  a  memory  given  an  input  key.  Finally,  the  generalization  capability  can  be  examined 
by  determining  the  maximum  perturbation  neighborhood  the  associative  memory  can 
tolerate  under  reliable  recall. 

Both  simulation  and  analytical  approaches  to  determining  such  evaluation  criteria 
can  be  employed.  Analytical  approaches  for  the  linear  associative  memories  as  displayed 
herein  are  straightforward.  However  for  nonlinear  memories,  often  stochastic  approaches 
are  used  taking  advantage  of  large  sample  properties  invoked  for  networks  with  large 
numbers  of  neurons.  Monte  Carlo  simulations  also  can  provide  evaluation  parameters  and 
often  with  much  less  effort.  Especially  if  benchmark  memory  sets  and  examples  are 
established,  evaluation  by  simulation  may  become  routine.  For  illustration,  memory 
capacity bounds obtained analytically [22] and by simulation [20] are shown in Fig. 20
for some associative memories.
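
As an illustration of evaluation by simulation, the sketch below estimates the recall accuracy (sum squared error) of a linear associative memory $T = YX^+$ under noisy key excitation, for several memory loads $M$. The random Gaussian keys and memories, the noise normalization, and the function name `mean_recall_sse` are all assumptions made for the example, not a benchmark from the text.

```python
import numpy as np

def mean_recall_sse(N, M, sigma, K=16, trials=200, seed=0):
    """Monte Carlo estimate of the sum squared recall error of T = Y X^+."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(N, M))            # M prototype keys of dimension N (columns)
    Y = rng.normal(size=(K, M))            # M associated memories of dimension K (columns)
    T = Y @ np.linalg.pinv(X)              # linear associative memory
    total = 0.0
    for _ in range(trials):
        i = rng.integers(M)                                # pick a stored pair at random
        n = rng.normal(0.0, sigma / np.sqrt(N), size=N)    # noise with E||n||^2 = sigma^2
        err = T @ (X[:, i] + n) - Y[:, i]                  # recalled minus true memory
        total += float(err @ err)
    return total / trials

N = 100
for M in (10, 40, 80):
    print(f"M={M:3d}  noiseless SSE = {mean_recall_sse(N, M, 0.0):.2e}  "
          f"noisy SSE = {mean_recall_sse(N, M, 1.0):.3f}")
```

In this setup the noiseless recall error is essentially zero for every load, while the error under noise grows as $M$ approaches $N$, which is the kind of capacity versus generalization trade-off such an evaluation is meant to expose.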


Fig. 20. Memory capacity for some associative memories (horizontal axis: vector dimension N).

Finally  like  any  evaluation,  the  best  associative  memory  is  the  candidate  with  the 
most  favorable  evaluation  results  for  those  attributes  most  crucial  for  the  application. 
Therefore  many  types  of  associative  memories  with  distinct  favorable  attributes  are 
envisioned  to  be  applied. 


4.  CONCLUSION 


Clearly the mathematical models of associative memory discussed possess certain
biological memory characteristics, such as pattern recognition by association, distributed
parallel processing, and generalization. Moreover, the models, through small scale
applications, seem to substantiate the prevailing wisdom acknowledging the suitability
of neural-like architectures for associative memory pattern recognition and other random
problems where algorithmic solutions do not exist [36]. However, these architectures,
developed through decades of research, remain distant in performance from their biological
counterparts, which exhibit seemingly infinite memory capacity and fast recognition.

Ironically, the mechanism perhaps responsible for the extraordinary capability,
namely distributed parallel processing, may well be the barrier which prohibits man from
truly understanding the origin of that capability. For memory, along with other brain
functions, is a collective phenomenon, distributed over vast neuronal regions. And in the
absence of techniques which enable investigation of brain function on a collective or systems
basis, principles underlying such extraordinary capabilities may never be uncovered, nor realized.

In conclusion, given the vast differences in pattern processing (speech
recognition, image understanding, decision making...) between computers and biological
systems, future research is likely to be directed toward discovering collective principles
underlying biological information processing. In the process, a sufficient
understanding may be gained to offer insight into the development of specialized
neuronal architectures, borrowing from both biological and physical sciences, for assisting
man in learning and problem solving.